Measuring XML Structured-ness with Entropy

Authors

  • Ruiming Tang
  • Huayu Wu
  • Stéphane Bressan

Abstract

XML is semi-structured. It can be used to annotate unstructured data, to represent structured data, and almost anything in between. Yet it is unclear how to formally characterize, let alone quantify, the structured-ness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study the correlation of these metrics on real and synthetic data sets.
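
The abstract does not give the metric definitions, so the following is only a rough sketch of what a path-based entropy measure could look like, not the authors' method. It assumes (hypothetically) that structural uniformity of paths is taken as the Shannon entropy of the distribution of root-to-leaf tag paths; the function names `leaf_paths` and `path_entropy` are illustrative.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def leaf_paths(element, prefix=""):
    """Yield the root-to-leaf tag path of every leaf element."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)


def path_entropy(xml_text):
    """Shannon entropy (in bits) of the root-to-leaf path distribution.

    Assumption: this is one plausible reading of "structural uniformity
    of paths"; the paper's actual definition may differ.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(leaf_paths(root))
    total = sum(counts.values())
    # p * log2(1/p), summed over all distinct paths
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform = "<db><r><a/></r><r><a/></r></db>"  # every record has the same shape
mixed = "<db><r><a/></r><r><b/></r></db>"    # two different record shapes

print(path_entropy(uniform))  # 0.0
print(path_entropy(mixed))    # 1.0
```

Under this assumed definition, a document whose records all share one shape scores 0 bits, and the score grows as the path distribution becomes more varied.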

Similar resources

Querying Semi-structured Data with Mutual Exclusion

Data analytics applications, content-based collaborative platforms and office applications require the integration and management of current and historical data from heterogeneous sources. XML is a standard data format for information. Thanks to its semi-structured-ness, it is a good candidate data model for the integration and management of heterogeneous content. However, the management of his...

Dissipative Transport

Nonequilibrium stationary states (NESS) describe the state of a mechanical system driven and maintained out of equilibrium by external forces. The main characteristic of a NESS is that it sustains steady flows or, equivalently, that it exhibits positive entropy production. We discuss general features of the fluctuations of the entropy production and corresponding flows. The NESS and the e...

Measuring Complexity of Domain Standard Specifications Using XML Schema Entropy

XML schemas are used extensively in e-commerce standardization initiatives. Such XML-based standards define the structure and the semantics of messages that are used to implement business transactions in a particular industry domain (e.g. travel). The design of the document structures that form the message payloads is of key importance as once the specification is published it is difficult to r...

Measuring Qualities of XML Schema Documents

The Extensible Markup Language (XML) is becoming a de facto standard for exchanging information among web applications. Efficient implementation of a web application requires efficient implementation of its XML and XML schema documents. The quality of an XML document has a great impact on the design quality of its schema document. Therefore, the design of the XML schema document plays an important role ...

Document Type Definition (DTD) Metrics

In this paper, we present two complexity metrics for assessing the quality of schemas written in the Document Type Definition (DTD) language. Both “Entropy (E) metric: E(DTD)” and “Distinct Structured Element Repetition Scale (DSERS) metric: DSERS(DTD)” are intended to measure the structural complexity of schemas in the DTD language. These metrics exploit a directed graph representation of schema docum...

Publication date: 2011